30 - Pattern Recognition [PR] - PR 26 [ID:23541]

Welcome back everybody to pattern recognition. So today we want to explore a bit more this

concept of kernels and we will look into something that is called Mercer's theorem. So looking

forward to exploring kernel spaces.

So let's have a look into kernels. So far we've seen that the linear decision boundaries in

their current form have serious limitations: they are too simple to provide good decision boundaries.
Non-linearly separable data cannot be classified, noisy data causes problems, and this formulation

allows you to work with vectorial data only. So one possible solution that we already hinted at

is mapping into a higher dimensional space using a non-linear feature transform and then using a

linear classifier. So we've seen that the SVM decision boundary can be rewritten in dual form

and then we could see that we essentially got rid of the actual normal vector and everything

could be written as a sum over the Lagrange multipliers, the class labels, and the actual

features and in this particular optimization problem we've seen that the feature vectors only

appear as inner products. So the conclusion is that we only have inner products here, and this can be applied in both the learning and the classification phase.
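For reference, here is a sketch of this dual formulation in standard notation (α for the Lagrange multipliers, y for the class labels, and a bias written as α₀; the exact symbols may differ from the slides):

\[
\max_{\boldsymbol{\alpha}} \; \sum_{i} \alpha_i \;-\; \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j \, y_i y_j \, \langle \mathbf{x}_i, \mathbf{x}_j \rangle
\qquad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_{i} \alpha_i y_i = 0,
\]
\[
f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i} \alpha_i y_i \, \langle \mathbf{x}_i, \mathbf{x} \rangle + \alpha_0 \Big).
\]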

Now let's look again at this inner product in a bit more detail. So we've seen that this was also the case in the

perceptron and there we've seen that essentially we were summing up over all the steps during the

training procedure and we already have seen that also in this case we essentially only had inner

products for the decision boundary and we've seen that we only need the observations that actually

produced updates of our decision boundary during the training process. So this is the set E here

if you remember. So again everything can be brought down to inner products.
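As an illustration (my own sketch with made-up names, not taken from the slides), the perceptron decision in this dual view only needs inner products with the observations in the update set E:

```python
import numpy as np

def dual_perceptron_decision(x, update_set, inner=np.dot):
    """Perceptron decision in dual form.

    update_set: the pairs (x_i, y_i) that triggered an update during training,
                i.e. the set E from the lecture.
    inner:      the inner product; can later be swapped for a kernel k(x_i, x).
    """
    score = sum(y_i * inner(x_i, x) for x_i, y_i in update_set)
    return np.sign(score)
```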

We can now use feature transforms, where this feature transform phi maps from a small d-dimensional space to a capital D-dimensional space, with capital D larger than or equal to small d, such that the resulting features are linearly separable.
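Written out, this is the mapping described above:

\[
\varphi : \mathbb{R}^{d} \rightarrow \mathbb{R}^{D}, \qquad D \ge d .
\]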

So let's look into one example: here we have some original feature space that is

centered around zero and all the observations from one class are in the center and all the

observations from the other class are arranged around the outside, and it's very clear that this example

cannot be solved with a linear decision boundary. But now I take the feature transform phi of x equal to x1 squared and x2 squared, so this has exactly the same dimensionality, and what we see is that by

mapping onto essentially these squared dimensions we can now find a linear decision boundary.
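Here is a small Python sketch of this example (my own construction with synthetic data, not the plot from the slides), where the transform phi(x) = (x1², x2²) turns the inner cluster versus outer ring into a linearly separable problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inner class: points close to the origin; outer class: points on a ring around it.
radii = np.concatenate([rng.uniform(0.0, 1.0, 100), rng.uniform(2.0, 3.0, 100)])
angles = rng.uniform(0.0, 2 * np.pi, 200)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.concatenate([-np.ones(100), np.ones(100)])

# Feature transform phi(x) = (x1^2, x2^2): the squared radius becomes a linear quantity.
Z = X ** 2

# In the transformed space a linear rule works: compare z1 + z2 = r^2 against a threshold.
threshold = (1.0 ** 2 + 2.0 ** 2) / 2          # anywhere between the two squared radii
predictions = np.where(Z.sum(axis=1) > threshold, 1, -1)
print("training accuracy:", np.mean(predictions == y))
```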

Now this is a simple example; things can get more difficult, and it already gets difficult

if we start using data that is not centered. In this case if we apply our feature transform

we see that, unfortunately, we are not able to separate these data linearly, but if we use a

3D transform that still includes for example x2 then you see that we can again find a linear decision

boundary. So this is again the idea of using a polynomial feature transform in order to map to

a higher dimensional space that then will allow us to use the linear decision boundaries.

So we've seen that the decision boundary given by a quadratic function is obviously not linear

but because it is linear in the parameters in A, we can map this to a high dimensional feature space

in order to get linear decision boundaries in this transformed high dimensional space.
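As a sketch of this argument in my own notation (for two-dimensional x, with a general quadratic parameterized by A, b, and c):

\[
f(\mathbf{x}) = \mathbf{x}^{\top} A \, \mathbf{x} + \mathbf{b}^{\top} \mathbf{x} + c
\]
is linear in the entries of A, b, and c, so collecting the monomials into
\[
\varphi(\mathbf{x}) = \left( x_1^2, \; x_1 x_2, \; x_2^2, \; x_1, \; x_2, \; 1 \right)^{\top}
\]
turns the quadratic boundary \(f(\mathbf{x}) = 0\) into a linear boundary \(\mathbf{a}^{\top} \varphi(\mathbf{x}) = 0\) in the transformed space.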

Now let's consider the distances in the transformed space, so I apply phi to some x and to some x prime and take the two-norm. If we write this out, then you see this is the inner product of the two differences, and then we can essentially expand this, and you see that I get essentially only inner products. So the two-norm in our transformed space can be written using only inner products. So with this kind of feature transform we can even evaluate distances only by means of inner products.
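Written out, the squared distance in the transformed space decomposes into inner products only:

\[
\|\varphi(\mathbf{x}) - \varphi(\mathbf{x}')\|_2^2
= \langle \varphi(\mathbf{x}), \varphi(\mathbf{x}) \rangle
- 2\,\langle \varphi(\mathbf{x}), \varphi(\mathbf{x}') \rangle
+ \langle \varphi(\mathbf{x}'), \varphi(\mathbf{x}') \rangle .
\]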

So this then also means that they can very easily be incorporated

into our support vector machine so the decision boundary is then given as the feature transformed

vectors using the inner product and also the optimization problem can be rewritten with this

inner product so we can integrate this quite easily also into the support vector machine.
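Written out (again a sketch in the same notation as above, not copied from the slides), the decision function and the objective simply carry the transformed vectors inside the inner products:

\[
f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i} \alpha_i y_i \, \langle \varphi(\mathbf{x}_i), \varphi(\mathbf{x}) \rangle + \alpha_0 \Big),
\qquad
\max_{\boldsymbol{\alpha}} \; \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j \, y_i y_j \, \langle \varphi(\mathbf{x}_i), \varphi(\mathbf{x}_j) \rangle .
\]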

This then brings us to the notion of a kernel function: a kernel function maps from two feature domains, which are of course both identical, namely the feature space X paired with itself, to some value in R, that is, a real value. It needs to be a symmetric function that maps pairs of features to real numbers, and in this case the property holds that our k is given as the inner product of the feature transforms, so k of x and x prime equals the inner product of phi of x and phi of x prime.
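As a quick numerical check (my own example, not from the video), the degree-two polynomial kernel is exactly the inner product of an explicit feature map, which is the property stated above:

```python
import numpy as np

def poly_kernel(x, x_prime):
    """Homogeneous polynomial kernel of degree 2: k(x, x') = <x, x'>^2."""
    return np.dot(x, x_prime) ** 2

def phi(x):
    """Explicit feature map (for 2-D inputs) whose inner product reproduces poly_kernel."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
x_prime = np.array([3.0, -1.0])
print(poly_kernel(x, x_prime))           # 1.0, since (1*3 + 2*(-1))^2 = 1
print(np.dot(phi(x), phi(x_prime)))      # 1.0 as well
```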

Part of a video series:

Accessible via: Open access

Duration: 00:14:02 min

Recording date: 2020-11-11

Uploaded on: 2020-11-11 17:18:43

Language: en-US

In this video, we look at kernels for Support Vector Machines and the Perceptron and learn about Mercer's Theorem.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders to watch the new video follow on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn Groups.

Music Reference: Damiano Baldoni - Thinking of You
